Dynamic Policy Programming with Function Approximation

Authors

  • Mohammad Gheshlaghi Azar
  • Vicenç Gómez
  • Hilbert J. Kappen
Abstract

In this paper, we consider the problem of planning in infinite-horizon discounted-reward Markov decision problems. We propose a novel iterative method, called dynamic policy programming (DPP), which updates the parametrized policy by a Bellman-like iteration. For the discrete state-action case, we establish L∞-norm loss bounds for the performance of the policy induced by DPP and prove that it asymptotically converges to the optimal policy. We then generalize our approach to large-scale (continuous) state-action problems using function approximation techniques. We provide L∞-norm performance-loss bounds for approximate DPP and compare these bounds with the standard results from approximate dynamic programming (ADP), showing that approximate DPP yields a tighter asymptotic bound than standard ADP methods. We also numerically compare the performance of DPP to other ADP and RL methods, and observe that approximate DPP asymptotically outperforms the other methods on the mountain-car problem.
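To make the "Bellman-like iteration on a parametrized policy" concrete, the sketch below runs a tabular DPP-style recursion on a small random MDP. It is a minimal illustration written from the abstract alone, not the authors' reference implementation: the action-preference array psi, the Boltzmann soft-max operator, the inverse temperature eta, and the random MDP are all illustrative assumptions.

```python
# Hedged sketch of a tabular DPP-style update on a random finite MDP.
# All names (psi, eta, soft_max_operator) are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions, gamma, eta = 5, 3, 0.95, 1.0

# Random finite MDP: transition kernel P[s, a, s'] and reward r[s, a].
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))
r = rng.uniform(0.0, 1.0, size=(n_states, n_actions))

def boltzmann_policy(psi, eta):
    """Softmax policy induced by the action preferences psi."""
    z = np.exp(eta * (psi - psi.max(axis=1, keepdims=True)))
    return z / z.sum(axis=1, keepdims=True)

def soft_max_operator(psi, eta):
    """Boltzmann-weighted average of the preferences, one value per state."""
    pi = boltzmann_policy(psi, eta)
    return (pi * psi).sum(axis=1)

psi = np.zeros((n_states, n_actions))   # action preferences parametrizing the policy
for k in range(500):
    m = soft_max_operator(psi, eta)      # shape (n_states,)
    # Bellman-like update of the preferences (one possible reading of DPP):
    psi = psi - m[:, None] + r + gamma * P @ m

policy = boltzmann_policy(psi, eta)      # concentrates on greedy actions as preferences separate
print(np.round(policy, 3))
```

Under this reading, the policy is never represented explicitly during the iteration; it is recovered at any point as the softmax of the preferences, which is what allows the same recursion to be combined with function approximation in the large-scale case.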


Similar articles

Extracting Dynamics Matrix of Alignment Process for a Gimbaled Inertial Navigation System Using Heuristic Dynamic Programming Method

In this paper, with the aim of estimating the internal dynamics matrix of a gimbaled inertial navigation system (as a discrete linear system), the discrete-time Hamilton-Jacobi-Bellman (HJB) equation for optimal control is derived. A Heuristic Dynamic Programming (HDP) algorithm for solving this equation is presented, followed by a neural network approximation for the cost function and control input ...


A Multi-Stage Single-Machine Replacement Strategy Using Stochastic Dynamic Programming

In this paper, the single-machine replacement problem is modeled within the frameworks of stochastic dynamic programming and control threshold policy, and some properties of the optimal values of the control thresholds are derived. Using these properties and by minimizing a cost function, the optimal values of two control thresholds for the time between productions of two successive nonco...


An Optimal Tax Relief Policy with Aligning Markov Chain and Dynamic Programming Approach

In this paper, Markov chain and dynamic programming were used to derive a suitable pattern for tax relief and the reduction of tax evasion, based on tax earnings in Iran from 2005 to 2009. Applying this model showed that tax evasion amounted to 6,714 billion Rials. With 4% relief to taxpayers and by calculating the present value of the received tax, it was reduced to 3,108 billion Rials. ...


An approximation algorithm and FPTAS for Tardy/Lost minimization with common due dates on a single machine

This paper addresses Tardy/Lost penalty minimization with common due dates on a single machine. According to this performance measure, if the tardiness of a job exceeds a predefined value, the job is lost and penalized by a fixed value. Initially, we present a 2-approximation algorithm and examine its worst-case ratio bound. Then, a pseudo-polynomial dynamic programming algorithm is de...


Tuning Approximate Dynamic Programming Policies for Ambulance Redeployment via Direct Search

In this paper we consider approximate dynamic programming methods for ambulance redeployment. We first demonstrate through simple examples how typical value function fitting techniques, such as approximate policy iteration and linear programming, may not be able to locate a high-quality policy even when the value function approximation architecture is rich enough to provide the optimal policy. ...


Estimation and accuracy of the stationary solution in the dynamic programming problem: New results

In this paper we give explicit error bounds for approximations of the optimal policy function in the stochastic dynamic programming problem. The approximated policy function is obtained by using the Bellman equation with an approximated value function and the error bounds depend on the primitive data of the problem. Neither differentiability of the return function nor interiority of solutions i...



Publication date: 2011